Kalman Based Temporal Difference Neural Network for Policy Generation under Uncertainty (KBTDNN)

Author

  • Alp Sardag
Abstract

Real-world environments are often only partially observable to agents, because of noisy sensors or incomplete perception. Moreover, their state spaces are continuous by nature, and agents must choose an action for each point in an internal continuous belief space. It is therefore convenient to model this type of decision-making problem as a Partially Observable Markov Decision Process (POMDP) with continuous observation and state spaces. Most POMDP methods, whether approximate or exact, assume that the underlying world dynamics, or POMDP parameters such as transition and observation probabilities, are known. However, for many real-world environments it is very difficult, if not impossible, to obtain such information. We assume that only the internal dynamics of the agent, such as the actuator noise and the interpretation of the sensor suite, are known. Using these internal dynamics, our algorithm, the Kalman Based Temporal Difference Neural Network (KBTDNN), generates an approximately optimal policy in a continuous belief state space. The policy over the continuous belief state space is represented by a temporal difference neural network. KBTDNN deals with continuous Gaussian-based POMDPs and uses a Kalman filter for belief state estimation. Given only the MDP reward and the internal dynamics of the agent, KBTDNN can automatically construct the approximately optimal policy without discretizing the state and observation spaces.
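The two ingredients named in the abstract, Kalman-filter belief estimation and temporal difference learning over the belief space, can be illustrated with a minimal sketch. This is not the authors' implementation: the scalar motion and sensor model, the noise variances `q` and `r`, the reward, and the linear value function (standing in for the paper's neural network) are all assumptions chosen to keep the example short.

```python
from dataclasses import dataclass


@dataclass
class Belief:
    mean: float  # estimated state
    var: float   # variance (uncertainty) of the estimate


def kalman_step(b, action, obs, q=0.1, r=0.5):
    """One predict/update cycle for an assumed scalar model:
    x' = x + action + N(0, q),  z = x' + N(0, r)."""
    # Predict: apply the action; actuator noise q widens the belief.
    mean_pred = b.mean + action
    var_pred = b.var + q
    # Update: blend in the observation, weighted by the Kalman gain.
    gain = var_pred / (var_pred + r)
    mean_new = mean_pred + gain * (obs - mean_pred)
    var_new = (1.0 - gain) * var_pred
    return Belief(mean_new, var_new)


def features(b):
    # Hypothetical belief features: bias, mean, variance.
    return [1.0, b.mean, b.var]


def value(w, b):
    # Linear value estimate over belief features (a stand-in for a network).
    return sum(wi * fi for wi, fi in zip(w, features(b)))


def td0_update(w, b, reward, b_next, alpha=0.1, gamma=0.9):
    """TD(0): move V(b) toward reward + gamma * V(b_next)."""
    delta = reward + gamma * value(w, b_next) - value(w, b)
    return [wi + alpha * delta * fi for wi, fi in zip(w, features(b))]


# Usage: one belief update followed by one value update.
b0 = Belief(mean=0.0, var=1.0)
b1 = kalman_step(b0, action=1.0, obs=1.2)
# Assumed reward: penalize distance of the believed state from the origin.
w = td0_update([0.0, 0.0, 0.0], b0, reward=-abs(b1.mean), b_next=b1)
```

Because the belief is Gaussian, it is fully described by a mean and a variance, so the value function can take those two numbers as input directly. That is what lets this kind of approach avoid discretizing the state and observation spaces.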

Similar resources

Deep Robust Kalman Filter

A Robust Markov Decision Process (RMDP) is a sequential decision making model that accounts for uncertainty in the parameters of dynamic systems. This uncertainty introduces difficulties in learning an optimal policy, especially for environments with large state spaces. We propose two algorithms, RTD-DQN and Deep-RoK, for solving large-scale RMDPs using nonlinear approximation schemes such as d...

Infrared Counter-Countermeasure Efficient Techniques using Neural Network, Fuzzy System and Kalman Filter

This paper presents design and implementation of three new Infrared Counter-Countermeasure (IRCCM) efficient methods using Neural Network (NN), Fuzzy System (FS), and Kalman Filter (KF). The proposed algorithms estimate tracking error or correction signal when jamming occurs. An experimental test setup is designed and implemented for performance evaluation of the proposed methods. The methods v...

Two-stage Stochastic Programing Based on the Accelerated Benders Decomposition for Designing Power Network Design under Uncertainty

In this paper, a comprehensive mathematical model for designing an electric power supply chain network via considering preventive maintenance under risk of network failures is proposed. The risk of capacity disruption of the distribution network is handled via using a two-stage stochastic programming as a framework for modeling the optimization problem. An applied method of planning for the net...

Sensorless Speed Control of Double Star Induction Machine With Five Level DTC Exploiting Neural Network and Extended Kalman Filter

This article presents a sensorless five level DTC control based on neural networks using Extended Kalman Filter (EKF) applied to Double Star Induction Machine (DSIM). The application of the DTC control brings a very interesting solution to the problems of robustness and dynamics. However, this control has some drawbacks such as the uncontrolled of the switching frequency and the strong ripple t...

Modeling and Spatio-Temporal Analysis of the Distribution of O3 in Tehran City Based on Neural Network and Spatial Analysis in GIS Environment

Air pollution is one of the most problems that people are facing today in metropolitan areas. Suspended particulates, carbon monoxide, sulfur dioxide, ozone and nitrogen dioxide are the five major pollutants of air that pose many problems to human health. The goal of this study is to propose a spatial approach for estimation and analyzing the spatial and temporal distribution of ozone based on ...

Publication date: 2006